Ludwig - Papers: Computer Science - Computer Vision and Pattern Recognition

Continual Learning: Fast and Slow

Quang Pham, et al. • (2022) • DOI: 10.48550/arXiv.2209.02370

According to the Complementary Learning Systems (CLS) theory (McClelland et al., 1995) in neuroscience, humans do effective continual learning through two complementary systems: a fast learnin...

Diffusion Beats Autoregressive in Data-Constrained Settings

Mihir Prabhudesai, et al. • (2025) • DOI: 10.48550/arXiv.2507.15857

Autoregressive (AR) models have long dominated the landscape of large language models, driving progress across a wide range of tasks. Recently, diffusion-based language models have emerged as a promis...

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

Michael M. Bronstein, et al. • (2021) • DOI: 10.48550/arXiv.2104.13478

The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to b...

Emerging Properties in Unified Multimodal Pretraining

Chaorui Deng, et al. • (2025) • DOI: 10.48550/arXiv.2505.14683

Unifying multimodal understanding and generation has shown impressive capabilities in cutting-edge proprietary systems. In this work, we introduce BAGEL, an open-source foundational model that nativel...

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Yang Yue, et al. • (2025) • DOI: 10.48550/arXiv.2504.13837

Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated notable success in enhancing the reasoning performance of large language models (LLMs), particularly on mathematics and ...

Visual Planning: Let's Think Only with Images

Yi Xu, et al. • (2025) • DOI: 10.48550/arXiv.2505.11409

Recent advancements in Large Language Models (LLMs) and their multimodal extensions (MLLMs) have substantially enhanced machine reasoning across diverse tasks. However, these models predominantly rely...

The Platonic Representation Hypothesis

Minyoung Huh, et al. • (2024) • DOI: 10.48550/arXiv.2405.07987

We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways...

Learning high-level visual representations from a child's perspective without strong inductive biases

A. Emin Orhan, Brenden M. Lake • (2023) • DOI: 10.48550/arXiv.2305.15372

Young children develop sophisticated internal models of the world based on their visual experience. Can such models be learned from a child's visual experience without strong inductive biases? To inve...

TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems

Khang H. N. Vo, et al. • (2025) • DOI: 10.48550/arXiv.2503.06380

This paper focuses on multimodal alignment within the realm of Artificial Intelligence, particularly in text and image modalities. The semantic gap between the textual and visual modality poses a disc...

Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

Mahmoud Assran, et al. • (2023) • DOI: 10.48550/arXiv.2301.08243

This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Archi...
